Job Market Analysis

An overview of the Lombardia job market

Data

 • Activated Contracts from Regione Lombardia

 • Ceased Contracts from Regione Lombardia

 • ATECO Code

 • GeoJSON Lombardia

In [27]:
# Retrieve all datasets from online sources if we don't already have them locally
import os
import requests

path = "./datasets/"

datasets = [
    {# Activated Contracts Dataset
        'url' : 'https://dati.lombardia.it/api/views/qbau-cyuc/rows.csv?accessType=DOWNLOAD',
        'filename' : 'Rapporti_di_lavoro_attivati.csv'
    },
    {# Ceased Contracts Dataset
        'url' : 'https://www.dati.lombardia.it/api/views/nwz3-p6vm/rows.csv?accessType=DOWNLOAD',
        'filename' : 'Rapporti_di_lavoro_cessati.csv'
    },
    {# ATECO Code Dataset
        'url' : 'https://www.istat.it/it/files//2022/03/Struttura-ATECO-2007-aggiornamento-2022.xlsx',
        'filename' : 'Struttura-ATECO-2007-aggiornamento-2022.xlsx'
    },
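    {# GeoJSON border coordinates of Italian provinces from GitHub
        # FIXME: this URL duplicates the Activated Contracts CSV above;
        # replace it with the URL of the GeoJSON source
        'url' : 'https://dati.lombardia.it/api/views/qbau-cyuc/rows.csv?accessType=DOWNLOAD',
        'filename' : 'limits_IT_provinces.geojson'
    },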
]

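# Create the datasets directory if it does not exist
os.makedirs(path, exist_ok=True)

# Retrieve any dataset that is not already on disk
for dataset in datasets:
    target = os.path.join(path, dataset["filename"])
    if not os.path.exists(target):
        r = requests.get(dataset["url"], allow_redirects=True)
        r.raise_for_status()  # fail loudly instead of saving an error page
        with open(target, 'wb') as f:
            f.write(r.content)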

Data Cleaning

 • Load Datasets

 • Remove Outliers

 • Join with ATECO Code Datasets

 • Manage null values

In [52]:
df_lav_att, df_lav_ces, df_ateco, df_geojson = load_datasets()
In [53]:
plot_age_distribution(df_lav_att)
In [54]:
plot_age_distribution(df_lav_ces)
In [55]:
plot_time_distribution(df_lav_att, "Activated Contracts")
plot_time_distribution(df_lav_ces, "Ceased Contracts")
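
The helper `load_datasets()` is defined elsewhere in the notebook and is not shown here. As a rough sketch of the "Remove Outliers" step, assuming outliers are implausible ages in `ETA` and out-of-range dates in `DATA`, the filter could look like the following. The function name `remove_outliers`, the age bounds (15 to 75), the date window and the day-first date format are illustrative assumptions, not values taken from the notebook.

```python
import pandas as pd

def remove_outliers(df, min_age=15, max_age=75,
                    start="2009-01-01", end="2021-12-31"):
    """Hypothetical helper: keep rows with a plausible age and an in-range date."""
    out = df.copy()
    # Unparseable dates become NaT and are dropped by the range filter below.
    out["DATA"] = pd.to_datetime(out["DATA"], format="%d/%m/%Y", errors="coerce")
    age_ok = out["ETA"].between(min_age, max_age)
    date_ok = out["DATA"].between(pd.Timestamp(start), pd.Timestamp(end))
    return out[age_ok & date_ok].reset_index(drop=True)

# Tiny made-up sample to show the effect
sample = pd.DataFrame({
    "DATA": ["09/05/2020", "12/07/1973", "03/11/2015", "01/01/2100"],
    "ETA": [60, 43, 120, 30],
})
cleaned = remove_outliers(sample)
print(cleaned)
```

Only the first sample row survives: the others fail either the age bound or the date window.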

Join with ATECO Code Datasets

In [57]:
df_lav_att.groupby(['SETTOREECONOMICODETTAGLIO'])['SETTOREECONOMICODETTAGLIO'].count().to_frame()
Out[57]:
Count  SETTOREECONOMICODETTAGLIO
  949  Acquacoltura in acqua di mare salmastra o lagunare e servizi connessi
  545  Acquacoltura in acque dolci e servizi connessi
  723  Affari esteri
 8729  Affittacamere per brevi soggiorni case ed appartamenti per vacanze bed and breakfast residence
  342  Affitto di aziende
  ...  ...
  984  Trattamento igienico del latte
 1599  Trivellazioni e perforazioni
   71  Università popolare
 1277  Utilizzo di aree forestali
  906  Villaggi turistici

1224 rows × 1 columns

In [59]:
df_lav_att = pd.merge(df_lav_att, df_ateco, how="left", left_on="SETTOREECONOMICODETTAGLIO", right_on="DescrizioneAteco")
df_lav_ces = pd.merge(df_lav_ces, df_ateco, how="left", left_on="SETTOREECONOMICODETTAGLIO", right_on="DescrizioneAteco")
df_lav_att.groupby(['MacroAteco', 'MacroDescrizione'])['MacroAteco'].count().to_frame()
Out[59]:
MacroAteco     Count  MacroDescrizione
A             191464  AGRICOLTURA SILVICOLTURA E PESCA
B               6122  ESTRAZIONE DI MINERALI DA CAVE E MINIERE
C            1683362  ATTIVITÀ MANIFATTURIERE
D              11992  FORNITURA DI ENERGIA ELETTRICA GAS VAPORE E ARIA CONDIZIONATA
E              30274  FORNITURA DI ACQUA RETI FOGNARIE ATTIVITÀ DI GESTIONE DEI RIFIUTI E RISANAMENTO
F             537945  COSTRUZIONI
G             976827  COMMERCIO ALLINGROSSO E AL DETTAGLIO RIPARAZIONE DI AUTOVEICOLI E MOTOCICLI
H             467976  TRASPORTO E MAGAZZINAGGIO
I            1264749  ATTIVITÀ DEI SERVIZI DI ALLOGGIO E DI RISTORAZIONE
J             713625  SERVIZI DI INFORMAZIONE E COMUNICAZIONE
K              66430  ATTIVITÀ FINANZIARIE E ASSICURATIVE
L              25336  ATTIVITÀ IMMOBILIARI
M             420783  ATTIVITÀ PROFESSIONALI SCIENTIFICHE E TECNICHE
N             869718  NOLEGGIO AGENZIE DI VIAGGIO SERVIZI DI SUPPORTO ALLE IMPRESE
O              87960  AMMINISTRAZIONE PUBBLICA E DIFESA ASSICURAZIONE SOCIALE OBBLIGATORIA
P             849114  ISTRUZIONE
Q             296441  SANITÀ E ASSISTENZA SOCIALE
R             315916  ATTIVITÀ ARTISTICHE SPORTIVE DI INTRATTENIMENTO E DIVERTIMENTO
S             213803  ALTRE ATTIVITÀ DI SERVIZI
T             348738  ATTIVITÀ DI FAMIGLIE E CONVIVENZE COME DATORI DI LAVORO PER PERSONALE DOMESTICO PRODUZIONE DI BENI E SERVIZI INDIFFERENZIATI PER USO PROPRIO DA PARTE DI FAMIGLIE E CONVIVENZE
U               1423  ORGANIZZAZIONI ED ORGANISMI EXTRATERRITORIALI
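
Because the join is a left merge on free-text descriptions, any sector label that does not match `DescrizioneAteco` exactly ends up with empty ATECO columns. A minimal sketch of how unmatched labels could be inspected with the `indicator` option of `pd.merge`, using toy frames with the same column names (the rows are illustrative, not taken from the datasets):

```python
import pandas as pd

# Toy stand-ins for df_lav_att and df_ateco (illustrative rows only)
contracts = pd.DataFrame({
    "SETTOREECONOMICODETTAGLIO": ["Villaggi turistici", "Affari esteri", "Unknown sector"],
})
ateco = pd.DataFrame({
    "DescrizioneAteco": ["Villaggi turistici", "Affari esteri"],
    "MacroAteco": ["I", "O"],
})

merged = pd.merge(
    contracts, ateco, how="left",
    left_on="SETTOREECONOMICODETTAGLIO", right_on="DescrizioneAteco",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Labels with no ATECO match, most frequent first
unmatched = (
    merged.loc[merged["_merge"] == "left_only", "SETTOREECONOMICODETTAGLIO"]
    .value_counts()
)
print(unmatched)
```

Listing the most frequent unmatched labels first shows which descriptions would be worth normalizing before the merge.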

Manage null values

In [60]:
# Activated Contracts
print("Activated Contracts")
missing = df_lav_att.isnull().sum()
percent_missing = df_lav_att.isnull().sum() * 100 / len(df_lav_att)
missing_value_df = pd.DataFrame({ 'Missing': missing, 'Missing %': percent_missing })
missing_value_df
Activated Contracts
Out[60]:
                           Missing  Missing %
DATA                             0   0.000000
GENERE                           0   0.000000
ETA                              0   0.000000
SETTOREECONOMICODETTAGLIO     2881   0.030169
TITOLOSTUDIO                   568   0.005948
CONTRATTO                        0   0.000000
MODALITALAVORO              439235   4.599527
PROVINCIAIMPRESA                 0   0.000000
ITALIANO                         0   0.000000
CodAteco                    169570   1.775682
DescrizioneAteco            169570   1.775682
MacroAteco                  169570   1.775682
MacroDescrizione            169570   1.775682

Manage null values

In [61]:
# Ceased Contracts
print("Ceased Contracts")
missing = df_lav_ces.isnull().sum()
percent_missing = df_lav_ces.isnull().sum() * 100 / len(df_lav_ces)
missing_value_df = pd.DataFrame({ 'Missing': missing, 'Missing %': percent_missing })
missing_value_df
Ceased Contracts
Out[61]:
                           Missing  Missing %
DATA                             0   0.000000
GENERE                           0   0.000000
ETA                              0   0.000000
SETTOREECONOMICODETTAGLIO     1014   0.027346
TITOLOSTUDIO                   411   0.011084
CONTRATTO                        0   0.000000
MODALITALAVORO                   0   0.000000
PROVINCIAIMPRESA                 0   0.000000
ITALIANO                         0   0.000000
CodAteco                     66950   1.805542
DescrizioneAteco             66950   1.805542
MacroAteco                   66950   1.805542
MacroDescrizione             66950   1.805542
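
The cells above only count the missing values; how they are then handled is not shown. One possible policy, sketched under the assumption that rarely-missing columns can simply be dropped while the more frequently missing `MODALITALAVORO` is better kept with a placeholder label. The helper name `manage_nulls`, the label `"NON DEFINITO"` and the sample values are hypothetical.

```python
import pandas as pd

def manage_nulls(df, fill_label="NON DEFINITO"):
    """Hypothetical policy: drop rarely-missing rows, label the frequently-missing column."""
    out = df.dropna(subset=["SETTOREECONOMICODETTAGLIO", "TITOLOSTUDIO"]).copy()
    out["MODALITALAVORO"] = out["MODALITALAVORO"].fillna(fill_label)
    return out

# Made-up rows that mimic the three columns with missing values
sample = pd.DataFrame({
    "SETTOREECONOMICODETTAGLIO": ["Affari esteri", None, "Villaggi turistici"],
    "TITOLOSTUDIO": ["LAUREA", "DIPLOMA", "DIPLOMA"],
    "MODALITALAVORO": ["TEMPO PIENO", "TEMPO PIENO", None],
})
result = manage_nulls(sample)
print(result.isnull().sum())
```

Dropping costs well under 0.1% of rows for the first two columns, whereas dropping on `MODALITALAVORO` would discard about 4.6% of the activated contracts.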

Data Exploration

In [63]:
plot_diff_att_ces(df_lav_att, df_lav_ces)

Economic Sector Distribution

In [64]:
plot_ateco_live(df_lav_att)

Geographic Distribution - Activated Contracts

In [65]:
plot_geo_data(df_lav_att, df_geojson, "oranges")

Geographic Distribution - Ceased Contracts

In [66]:
plot_geo_data(df_lav_ces, df_geojson, "blues")

Prediction

 • Train and Test Data

 • Statistical Models

 • ML & DL Models

 • Model Evaluation

Train and Test Data

In [67]:
dataset = prepare_dataset(df_lav_att, df_lav_ces)
ts_actces, train, test = get_train_test(dataset, "2009-01", "2016-12", "2017-01", "2019-12")
plot_train_test(train, test)
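
The helpers `prepare_dataset` and `get_train_test` are defined elsewhere in the notebook. A minimal sketch of a chronological split with the same period boundaries, assuming the prepared data is a monthly series with a `DatetimeIndex`. The function name `split_by_period` and the synthetic values are placeholders.

```python
import numpy as np
import pandas as pd

def split_by_period(ts, train_start, train_end, test_start, test_end):
    """Hypothetical stand-in for get_train_test: slice a date-indexed series."""
    # Partial-string slicing on a DatetimeIndex includes both endpoints.
    train = ts.loc[train_start:train_end]
    test = ts.loc[test_start:test_end]
    return train, test

# Synthetic monthly series, 2009-01 through 2019-12 (values are placeholders)
idx = pd.date_range("2009-01-01", "2019-12-01", freq="MS")
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

train, test = split_by_period(ts, "2009-01", "2016-12", "2017-01", "2019-12")
print(len(train), len(test))
```

With these boundaries the split is 8 years of training data against 3 years of test data, and no test observation precedes a training one.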

ARIMA / SARIMA

In [68]:
arima_model = arima(ts_actces, train, test)
MAPE - Mean Absolute Percentage Error: 0.174280
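
Every model in this section is scored with MAPE, the mean of |actual - forecast| / |actual| over the test period. Assuming the reported values are fractions rather than percentages (the convention used by scikit-learn's `mean_absolute_percentage_error`), 0.174280 means the forecast is off by about 17.4% of the actual value on average. A small reference sketch with made-up numbers:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.10 means 10%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Undefined when an actual value is zero, so fail early in that case.
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when y_true contains zeros")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mape(actual, forecast))
```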

Prophet

In [69]:
prophet_model = prophet(train, test)
22:46:24 - cmdstanpy - INFO - Chain [1] start processing
22:46:25 - cmdstanpy - INFO - Chain [1] done processing
MAPE - Mean Absolute Percentage Error: 0.186392

SVR

In [70]:
svr_model = svr(train, test)
MAPE - Mean Absolute Percentage Error: 0.083226

MLP Regressor

In [71]:
mlp_model = mlp(train, test)
MAPE - Mean Absolute Percentage Error: 0.123779
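
SVR and the MLP regressor are general-purpose regressors, so the series has to be recast as a supervised problem before they can be fitted; the `timesteps` argument visible in the LSTM traceback below suggests a sliding window is used for this. A minimal sketch of such a windowing step, where each sample holds `timesteps - 1` lagged values and the next value is the target. The helper name `make_windows` and the default window length are assumptions.

```python
import numpy as np

def make_windows(values, timesteps=13):
    """Hypothetical helper: split a 1-D series into (timesteps-1) lags plus one target."""
    values = np.asarray(values, dtype=float)
    if len(values) < timesteps:
        raise ValueError("series is shorter than one window")
    # Each row is one window of `timesteps` consecutive observations.
    windows = np.stack([values[i:i + timesteps]
                        for i in range(len(values) - timesteps + 1)])
    X, y = windows[:, :-1], windows[:, -1]
    return X, y

series = np.arange(20.0)  # placeholder data
X, y = make_windows(series, timesteps=5)
print(X.shape, y.shape)
```

For an LSTM the same `X` would additionally be reshaped to `(samples, timesteps - 1, n_features)`.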

LSTM

In [72]:
lstm_model = lstm(train, test)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-72-0841f7f7e10c> in <module>
----> 1 lstm_model = lstm(train, test)

<ipython-input-51-a2be28bcfc84> in lstm(train, test, timesteps)
    384 
    385     model_lstm = Sequential()
--> 386     model_lstm.add(LSTM(50, activation='relu', input_shape=(timesteps-1, n_features)))
    387     model_lstm.add(Dense(1))
    388     model_lstm.compile(optimizer='adam', loss='mse')

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    515     self._self_setattr_tracking = False  # pylint: disable=protected-access
    516     try:
--> 517       result = method(self, *args, **kwargs)
    518     finally:
    519       self._self_setattr_tracking = previous_value  # pylint: disable=protected-access

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/sequential.py in add(self, layer)
    206           # and create the node connecting the current layer
    207           # to the input layer we just created.
--> 208           layer(x)
    209           set_inputs = True
    210 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in __call__(self, inputs, initial_state, constants, **kwargs)
    658 
    659     if initial_state is None and constants is None:
--> 660       return super(RNN, self).__call__(inputs, **kwargs)
    661 
    662     # If any of `initial_state` or `constants` are specified and are Keras

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    950     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    951       return self._functional_construction_call(inputs, args, kwargs,
--> 952                                                 input_list)
    953 
    954     # Maintains info about the `Layer.call` stack.

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1089         # Check input assumptions set after layer building, e.g. input shape.
   1090         outputs = self._keras_tensor_symbolic_call(
-> 1091             inputs, input_masks, args, kwargs)
   1092 
   1093         if outputs is None:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
    820       return nest.map_structure(keras_tensor.KerasTensor, output_signature)
    821     else:
--> 822       return self._infer_output_signature(inputs, args, kwargs, input_masks)
    823 
    824   def _infer_output_signature(self, inputs, args, kwargs, input_masks):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
    861           # TODO(kaftan): do we maybe_build here, or have we already done it?
    862           self._maybe_build(inputs)
--> 863           outputs = call_fn(inputs, *args, **kwargs)
    864 
    865         self._handle_activity_regularization(inputs, outputs)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent_v2.py in call(self, inputs, mask, training, initial_state)
   1155 
   1156     # LSTM does not support constants. Ignore it during process.
-> 1157     inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None)
   1158 
   1159     if isinstance(mask, list):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in _process_inputs(self, inputs, initial_state, constants)
    857         initial_state = self.states
    858     elif initial_state is None:
--> 859       initial_state = self.get_initial_state(inputs)
    860 
    861     if len(initial_state) != len(self.states):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in get_initial_state(self, inputs)
    641     if get_initial_state_fn:
    642       init_state = get_initial_state_fn(
--> 643           inputs=None, batch_size=batch_size, dtype=dtype)
    644     else:
    645       init_state = _generate_zero_filled_state(batch_size, self.cell.state_size,

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in get_initial_state(self, inputs, batch_size, dtype)
   2505   def get_initial_state(self, inputs=None, batch_size=None, dtype=None):
   2506     return list(_generate_zero_filled_state_for_cell(
-> 2507         self, inputs, batch_size, dtype))
   2508 
   2509 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in _generate_zero_filled_state_for_cell(cell, inputs, batch_size, dtype)
   2985     batch_size = array_ops.shape(inputs)[0]
   2986     dtype = inputs.dtype
-> 2987   return _generate_zero_filled_state(batch_size, cell.state_size, dtype)
   2988 
   2989 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in _generate_zero_filled_state(batch_size_tensor, state_size, dtype)
   3001 
   3002   if nest.is_nested(state_size):
-> 3003     return nest.map_structure(create_zeros, state_size)
   3004   else:
   3005     return create_zeros(state_size)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
    657 
    658   return pack_sequence_as(
--> 659       structure[0], [func(*x) for x in entries],
    660       expand_composites=expand_composites)
    661 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
    657 
    658   return pack_sequence_as(
--> 659       structure[0], [func(*x) for x in entries],
    660       expand_composites=expand_composites)
    661 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py in create_zeros(unnested_state_size)
   2998     flat_dims = tensor_shape.TensorShape(unnested_state_size).as_list()
   2999     init_state_size = [batch_size_tensor] + flat_dims
-> 3000     return array_ops.zeros(init_state_size, dtype=dtype)
   3001 
   3002   if nest.is_nested(state_size):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199     """Call target, and fall back on dispatchers if there is a TypeError."""
    200     try:
--> 201       return target(*args, **kwargs)
    202     except (TypeError, ValueError):
    203       # Note: convert_to_eager_tensor currently raises a ValueError, not a

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py in wrapped(*args, **kwargs)
   2817 
   2818   def wrapped(*args, **kwargs):
-> 2819     tensor = fun(*args, **kwargs)
   2820     tensor._is_zeros_tensor = True
   2821     return tensor

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py in zeros(shape, dtype, name)
   2866           # Create a constant if it won't be very big. Otherwise create a fill
   2867           # op to prevent serialized GraphDefs from becoming too large.
-> 2868           output = _constant_if_small(zero, shape, dtype, name)
   2869           if output is not None:
   2870             return output

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py in _constant_if_small(value, shape, dtype, name)
   2802 def _constant_if_small(value, shape, dtype, name):
   2803   try:
-> 2804     if np.prod(shape) < 1000:
   2805       return constant(value, shape=shape, dtype=dtype, name=name)
   2806   except TypeError:

<__array_function__ internals> in prod(*args, **kwargs)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy/core/fromnumeric.py in prod(a, axis, dtype, out, keepdims, initial, where)
   3050     number_of_dimensions : int
   3051         The number of dimensions in `a`.  Scalars are zero-dimensional.
-> 3052 
   3053     See Also
   3054     --------

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     84             # support a dtype.
     85             if dtype is not None:
---> 86                 return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
     87             else:
     88                 return reduction(axis=axis, out=out, **passkwargs)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in __array__(self)
    853         "Cannot convert a symbolic Tensor ({}) to a numpy array."
    854         " This error may indicate that you're trying to pass a Tensor to"
--> 855         " a NumPy call, which is not supported".format(self.name))
    856 
    857   def __len__(self):

NotImplementedError: Cannot convert a symbolic Tensor (lstm_3/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
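
This `NotImplementedError` is commonly reported when a TensorFlow release older than 2.5 runs together with NumPy 1.20 or newer; the usual remedies are pinning NumPy below 1.20 or upgrading TensorFlow. The installed versions are not shown in the traceback, so whether that is the cause here is an assumption. A small sketch that flags the suspect pairing from version strings (the thresholds encode that commonly reported combination, not a documented rule):

```python
def version_tuple(version):
    """Parse the leading 'major.minor' of a version string into a tuple of ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def is_suspect_combo(tf_version, np_version):
    """True for the pairing often associated with this error (assumed thresholds)."""
    return version_tuple(tf_version) < (2, 5) and version_tuple(np_version) >= (1, 20)

# In the notebook one would pass tf.__version__ and np.__version__ instead.
print(is_suspect_combo("2.4.1", "1.20.3"))
print(is_suspect_combo("2.4.1", "1.19.5"))
```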

Prediction Results

In [ ]:
plot_prediction(df_plot)

Prediction Results

In [ ]:
plot_prediction_and_actual(df_plot)

Conclusions

 • Evaluation on Test data

 • Evaluation on forecasting 2020 - 2021

 • Work with more historical data

 • Cross Validation and Hyperparameter Tuning
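
For time series, cross validation has to respect chronology: each fold trains on the past and validates on the block that follows. A minimal sketch of expanding-window splits in plain Python (scikit-learn's `TimeSeriesSplit` provides a comparable scheme). The function name, the number of splits and the 12-month test size are illustrative assumptions.

```python
def expanding_window_splits(n_samples, n_splits=3, test_size=12):
    """Yield (train_indices, test_indices) pairs that never train on the future."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

# 132 monthly observations would cover 2009-01 through 2019-12
splits = list(expanding_window_splits(132, n_splits=3, test_size=12))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx[0], test_idx[-1])
```

Hyperparameters can then be chosen by averaging the validation MAPE over the folds instead of relying on a single train and test split.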

Thanks for your attention!